25 research outputs found

    Reliable and Real-Time Distributed Abstractions

    The celebrated distributed computing approach for building systems and services using multiple machines continues to expand to new domains. Computation devices nowadays have additional sensing and communication capabilities, while becoming, at the same time, cheaper, faster and more pervasive. Consequently, areas like industrial control, smart grids and sensor networks are increasingly using such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy efficiency requirements. Moreover, constraints that govern communication are also different. Networks become susceptible to inevitable random losses, especially when utilizing wireless and power line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, the performance and the application requirements and constraints of real-world control, smart grid and sensor systems. For completeness, we discuss four distributed abstractions, starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters where recharging might be infeasible. We establish transmission policies that can be used by processes to decide when to transmit over the network in order to avoid losses and minimize re-transmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level higher than links is failure detection, a software abstraction that relies on communication for identifying process crashes. We prove impossibility results concerning implementing classic eventual failure detectors in networks with probabilistic losses. We define a new, implementable type of failure detector, which preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might get information about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs) require a membership service where processes agree about failures in real time. We prove that the necessary properties of this membership cannot be implemented deterministically, given probabilistic losses. We propose an algorithm that satisfies these properties with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances DCS dependability, compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared memory abstraction, which vastly simplifies programming control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms that are built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure detection messages, analyze the performance of our technique and showcase ways of adapting it to various application needs and workloads.

    You Only Live Multiple Times: A Blackbox Solution for Reusing Crash-Stop Algorithms In Realistic Crash-Recovery Settings

    Distributed agreement-based algorithms are often specified in a crash-stop asynchronous model augmented by Chandra and Toueg's unreliable failure detectors. In such models, correct nodes stay up forever, incorrect nodes eventually crash and remain down forever, and failure detectors eventually behave correctly forever. However, in reality, both nodes and communication links crash and recover, without deterministic guarantees to remain in some state forever. In this paper, we capture this realistic temporary and probabilistic behaviour in a simple new system model. Moreover, we identify a large algorithm class for which we devise a property-preserving transformation. Using this transformation, many algorithms written for the asynchronous crash-stop model run correctly and unchanged in real systems.

    RT-ByzCast: Byzantine-Resilient Real-Time Reliable Broadcast

    Today’s cyber-physical systems face various impediments to achieving their intended goals: communication uncertainties and faults, related to the increased integration of networked and wireless devices, hinder the synchronism needed to meet real-time deadlines. Moreover, being critical, these systems are also exposed to significant security threats. This threat combination increases the risk of physical damage. This paper addresses these problems by studying how to build the first real-time Byzantine reliable broadcast protocol (RTBRB) tolerating network uncertainties, faults, and attacks. Previous literature describes either real-time reliable broadcast protocols, or asynchronous (non-real-time) Byzantine ones. We first prove that it is impossible to implement RTBRB using traditional distributed computing paradigms, e.g., where the error/failure detection mechanisms of processes are decoupled from the broadcast algorithm itself, even with the help of the most powerful failure detectors. We circumvent this impossibility by proposing RT-ByzCast, an algorithm based on aggregating digital signatures in a sliding time-window and on empowering processes with self-crashing capabilities to mask and bound losses. We show that RT-ByzCast (i) operates in real-time by proving that messages broadcast by correct processes are delivered within a known bounded delay, and (ii) is reliable by demonstrating that correct processes using our algorithm crash themselves with a negligible probability, even with message loss rates as high as 60%.
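
    As a rough illustration of why repeating a message inside a time window can mask high loss rates, the sketch below assumes that each transmission is lost independently with the same probability. That independence assumption, the function name `repetitions_needed` and the target value are made up here for illustration; they are not the loss model or the analysis used by RT-ByzCast.

```python
import math


def repetitions_needed(loss_rate: float, target: float) -> int:
    """Smallest k with loss_rate**k <= target, assuming independent losses.

    loss_rate: probability that a single transmission is lost (0 < p < 1).
    target:    acceptable probability that all k transmissions are lost.
    """
    if not (0.0 < loss_rate < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("loss_rate and target must lie strictly between 0 and 1")
    # Solve loss_rate**k <= target for k by taking logarithms.
    k = max(1, math.ceil(math.log(target) / math.log(loss_rate)))
    # Correct for floating-point rounding in either direction.
    while loss_rate ** k > target:
        k += 1
    while k > 1 and loss_rate ** (k - 1) <= target:
        k -= 1
    return k


if __name__ == "__main__":
    # Hypothetical numbers: 60% loss rate, one-in-a-million residual risk.
    print(repetitions_needed(0.6, 1e-6))
```

    Under these assumptions, the number of repetitions grows only logarithmically as the target probability shrinks, which is the intuition behind bounding losses within a window.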

    Qualitative Analysis for Validating IEC 62443-4-2 Requirements in DevSecOps

    Validation of conformance to cybersecurity standards for industrial automation and control systems is an expensive and time-consuming process which can delay the time to market. It is therefore crucial to introduce conformance validation stages into the continuous integration/continuous delivery pipeline of products. However, designing such conformance validation in an automated fashion is a highly non-trivial task that requires expert knowledge and depends upon the available security tools, ease of integration into the DevOps pipeline, as well as support for IT and OT interfaces and protocols. This paper addresses the aforementioned problem, focusing on the automated validation of ISA/IEC 62443-4-2 standard component requirements. We present an extensive qualitative analysis of the standard requirements and the current tooling landscape to perform validation. Our analysis demonstrates the coverage established by the currently available tools and sheds light on current gaps to achieve full automation and coverage. Furthermore, we showcase, for every component requirement, at which CI/CD pipeline stage it is recommended to test it and which tools to use.

    Facing the Safety-Security Gap in RTES: the Challenge of Timeliness

    Safety-critical real-time systems, including real-time cyber-physical and industrial control systems, need to be not only correct but also timely. Untimely (stale) results may have severe consequences that could render the control system’s behaviour hazardous to the physical world. To ensure predictability and timeliness, developers follow a rigorous process, which essentially ensures real-time properties a priori, in all but the most unlikely combinations of circumstances. However, the complexity of both real-time applications and the environments they run on has increased. Combined with the increasing sophistication of attacks mounted on RTES, the case for ensuring both safety and security through aprioristic predictability loses traction, which presents an opportunity, taken in this paper, to discuss current practices of critical real-time system design. To this end, with a slant on low-level task scheduling, we first investigate the challenges and opportunities for anticipating successful attacks on real-time systems. Then, we propose ways for adapting traditional fault- and intrusion-tolerant mechanisms to tolerate such hazards. We found that tasks which typically execute as analyzed under accidental faults may exhibit fundamentally different behavior when compromised by malicious attacks, even with interference enforcement in place.

    Right On Time Distributed Shared Memory

    The demand for real-time data storage in distributed control systems (DCSs) is growing. Yet, providing real-time DCS guarantees is challenging, especially when more and more sensor and actuator devices are connected to industrial plants and message loss needs to be taken into account. In this paper, we investigate how to build a shared memory abstraction for DCSs as a first step towards implementing different shared storage systems in a DCS context. We first prove that, in the presence of host crashes and message losses, the necessary guarantees of such an abstraction are impossible to implement using a traditional approach that has no access to the internals of existing DCS services, e.g., a modular approach where algorithms are built on top of existing software blocks like failure detectors. We propose a white-box approach that utilizes messages of existing services in any DCS as the sole means of communication. More precisely, we present TapeWorm, an algorithm that attaches itself to the heartbeat messages of the failure detector component in DCSs. We prove that TapeWorm implements the desired shared memory guarantees for applications running on a DCS. We also analyze the performance of TapeWorm and we showcase ways of adapting TapeWorm to various application needs and workloads.
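
    The general idea of attaching shared-memory updates to failure-detector heartbeats can be sketched as follows. This is a minimal toy under stated assumptions, not the TapeWorm algorithm: the `Heartbeat` and `Node` classes, the version-based conflict rule and the absence of any loss or crash handling are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """A failure-detector heartbeat that also carries piggybacked writes."""
    sender: str
    seq: int
    writes: list = field(default_factory=list)  # (key, value, version) tuples


class Node:
    """Toy node: local writes travel only inside its next heartbeat."""

    def __init__(self, name: str):
        self.name = name
        self.seq = 0        # heartbeat sequence number
        self.version = 0    # logical version counter for writes
        self.pending = []   # writes not yet attached to a heartbeat
        self.store = {}     # key -> (version, writer, value)

    def write(self, key, value):
        self.version += 1
        self._apply(key, value, self.version, self.name)
        self.pending.append((key, value, self.version))

    def next_heartbeat(self) -> Heartbeat:
        # No separate data message is sent: the heartbeat is the only carrier.
        # Simplification: pending writes are dropped after one heartbeat, so a
        # lost heartbeat loses them; a real protocol has to handle losses.
        self.seq += 1
        hb = Heartbeat(self.name, self.seq, list(self.pending))
        self.pending.clear()
        return hb

    def on_heartbeat(self, hb: Heartbeat):
        for key, value, version in hb.writes:
            self._apply(key, value, version, hb.sender)
            self.version = max(self.version, version)

    def _apply(self, key, value, version, writer):
        # Keep the entry with the highest (version, writer) pair.
        current = self.store.get(key)
        if current is None or (version, writer) > current[:2]:
            self.store[key] = (version, writer, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[2]


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.write("valve", "open")
    b.on_heartbeat(a.next_heartbeat())
    print(b.read("valve"))
```

    The point of the sketch is only the white-box coupling: the shared-memory layer adds no messages of its own and reuses traffic that the failure detector already generates.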

    RepuCoin: Your Reputation is Your Power

    Existing proof-of-work cryptocurrencies cannot tolerate attackers controlling more than 50% of the network’s computing power at any time, but assume that such a condition is “unlikely”. However, recent attack sophistication, e.g., where attackers can rent mining capacity to obtain a majority of computing power temporarily, renders this assumption unrealistic. This paper proposes RepuCoin, the first system to provide guarantees even when more than 50% of the system’s computing power is temporarily dominated by an attacker. RepuCoin physically limits the rate of voting power growth of the entire system. In particular, RepuCoin defines a miner’s power by its ‘reputation’, as a function of its work integrated over the time of the entire blockchain, rather than through instantaneous computing power, which can be obtained relatively quickly and/or temporarily. As an example, after a single year of operation, RepuCoin can tolerate attacks compromising 51% of the network’s computing resources, even if such power stays maliciously seized for almost a whole year. Moreover, RepuCoin provides better resilience to known attacks, compared to existing proof-of-work systems, while achieving a high throughput of 10000 transactions per second (TPS).
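
    The contrast between instantaneous computing power and work integrated over time can be illustrated with a toy calculation. The sketch below is not RepuCoin's reputation function: it assumes constant work rates, measures influence simply as a share of cumulative work, and uses hypothetical names and numbers throughout.

```python
def instantaneous_share(attacker_rate: float, honest_rate: float) -> float:
    """Attacker's share of current computing power."""
    return attacker_rate / (attacker_rate + honest_rate)


def cumulative_share(attacker_rate: float, honest_rate: float,
                     honest_head_start: float, elapsed: float) -> float:
    """Attacker's share of all work done so far.

    Assumes (for illustration only) that honest miners worked at a constant
    rate for `honest_head_start` time units before the attacker appeared,
    and that both sides keep working at constant rates for `elapsed` units.
    """
    attacker_work = attacker_rate * elapsed
    honest_work = honest_rate * (honest_head_start + elapsed)
    total = attacker_work + honest_work
    return attacker_work / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scenario: attacker seizes 51% of power after 365 days.
    print(instantaneous_share(51, 49))
    for days in (0, 100, 300):
        print(days, round(cumulative_share(51, 49, 365, days), 3))
```

    In this toy model the attacker holds a majority of instantaneous power from the first moment, while its share of integrated work starts at zero and grows only gradually, which is the intuition behind rate-limiting voting power.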

    Byzantine Resilient Protocol for the IoT

    Wireless sensor networks (WSNs), often adhering to a single gateway architecture, constitute the communication backbone for many modern cyber-physical systems (CPS). Consequently, fault-tolerance in CPS becomes a challenging task, especially when accounting for failures (potentially malicious) that incapacitate the gateway or disrupt the nodes-gateway communication, not to mention the energy, timeliness, and security constraints demanded by CPS domains. This paper aims at improving the fault-tolerance of WSN-based CPS to increase system and data availability. To this end, we propose a replicated gateway architecture augmented with energy-efficient real-time Byzantine-resilient data communication protocols. At the sensor level, we introduce FT-TSTP, a geographic routing protocol capable of delivering messages in an energy-efficient and timely manner to multiple gateways, even in the presence of voids caused by faulty and malicious sensor nodes. At the gateway level, we propose a multi-gateway synchronization protocol, which we call ByzCast, that delivers timely correct data to CPS applications, despite the failure or maliciousness of a number of gateways. We show, through extensive simulations, that our protocols provide better system robustness, yielding increased system and data availability while meeting CPS energy, timeliness, and security demands.
